Offline Word Spotting in Handwritten Documents
Abstract
The automatic digitization of written human knowledge into string data currently extends to typeset text but not beyond it. Vast libraries of handwritten, cursive documents must therefore be indexed and transcribed by a human, a prohibitively laborious task. This paper explores an existing technique, developed in [1] and [12], for the offline indexing of historical handwritten documents. The algorithm clusters segmented word images using Dynamic Time Warping (DTW), which compares the sequences of time-dependent feature vectors extracted from two images. Once the words in an unknown document are clustered, a human only has to label the word clusters rather than each word, which significantly reduces the workload. Currently the algorithm achieves a 40.8% classification rate using only two features. Other research from the community suggests that a classification rate of between 65% and 72% is achievable with additional features and further refinement of the algorithm.
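The DTW comparison described in the abstract can be illustrated with a minimal sketch. It is not the implementation from the paper or from [1] and [12]. The abstract does not name the two features used, so the per-column features here (ink-pixel count and topmost ink row) are assumptions chosen for illustration, as are the function names `column_features` and `dtw_distance`. The image column index plays the role of time, and the returned cost is left unnormalized.

```python
import math


def column_features(image):
    """Turn a binary word image into one feature vector per column.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    The two features are illustrative assumptions, not the paper's:
    the ink-pixel count and the row index of the topmost ink pixel
    (or the image height if the column is empty).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    features = []
    for x in range(width):
        column = [image[y][x] for y in range(height)]
        ink = sum(column)
        top = next((y for y, v in enumerate(column) if v), height)
        features.append((ink, top))
    return features


def dtw_distance(a, b):
    """Unnormalized DTW cost between two sequences of feature vectors.

    The local cost is the squared Euclidean distance; each step may
    advance in `a`, in `b`, or in both.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Two word images would be compared with `dtw_distance(column_features(img_a), column_features(img_b))`, and the resulting pairwise distances could then be passed to any clustering procedure. Because the warping path may repeat columns, a horizontally stretched copy of a word can still align at low cost, which is the reason for preferring DTW over a column-by-column comparison.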
Similar resources
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Indexing and Retrieval of On-line Handwritten Documents
Recent advances in on-line data capturing technologies and their widespread deployment in devices like PDAs and notebook PCs are creating large amounts of handwritten data that need to be archived and retrieved efficiently. Word-spotting, which is based on a direct comparison of a handwritten keyword to words in the document, is commonly used for indexing and retrieval. We propose a string matchin...
Keyword spotting in unconstrained handwritten Chinese documents using contextual word model
Keywords: keyword spotting; Chinese handwritten documents; word similarity; contextual word model. This paper proposes a method for keyword spotting in off-line Chinese handwritten documents using a contextual word model, which measures the similarity between the query word and every candidate word in the document by combining a character classifier and the geometric context a...
Radial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting
Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exi...
Segmentation-Based And Segmentation-Free Methods for Spotting Handwritten Arabic Words
Given a set of handwritten documents, a common goal is to search for a relevant subset. Attempting to find a query word or image in such a set of documents is called word spotting. Spotting handwritten words in documents written in the Latin alphabet, and more recently in Arabic, has received considerable attention. One issue is generating candidate word regions on a page. Attempting to definit...
Lexicon-free handwritten word spotting using character HMMs
For retrieving keywords from scanned handwritten documents, we present a word spotting system that is based on character Hidden Markov Models. In an efficient lexicon-free approach, arbitrary keywords can be spotted without pre-segmenting text lines into words. For a multi-writer scenario on the IAM off-line database as well as for two single writer scenarios on historical data sets, it is show...
Publication date: 2007